Business Intelligence and Data Mining by Anil K. Maheshwari PhD
Author: Anil K. Maheshwari, PhD
Language: eng
Format: epub
Tags: Business Expert Press, Data Analytics, Data Mining, Business Intelligence, Decision Trees, Regression, Neural Networks, Cluster analysis, Association rules.
Published: 2014-12-29T11:06:47+00:00
DECISION TREES
4. Split the data into mutually exclusive subsets along the lines of the specific split.
5. Repeat Steps 2 and 3 for each and every leaf node until the stopping criteria are met.
There are many algorithms for making decision trees. The most popular ones are C5, CART, and CHAID. They differ on three key elements:
1. Splitting criteria
a. Which variable to use for the first split? How should one determine the most important variable for the first branch, and subsequently, for each subtree? Common measures include least errors, information gain, and the Gini coefficient.
b. What values to use for the split? If the variables have continuous values, such as age or BP, what value ranges should be used to make bins?
c. How many branches should be allowed for each node? A tree could be binary, with just two branches at each node, or it could allow more branches.
2. Stopping criteria
a. When to stop building the tree? There are two major ways to make that determination. Tree building could be stopped when a certain depth of branches has been reached, beyond which the tree becomes unreadable. It could also be stopped when the error level at a node is within predefined tolerable levels.
3. Pruning
a. Prepruning and postpruning: The tree could be trimmed to make it more balanced and easier to use. Pruning is often done after the tree is constructed.
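The splitting measures named in the list above can be made concrete with a short sketch. The code below is only an illustration in plain Python, not the procedure of C5, CART, or CHAID: it computes entropy, Gini impurity, and information gain, and then scans candidate cut points for one continuous variable. The function names and the toy age data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gini(labels):
    """Gini impurity: chance that two random draws have different labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def information_gain(parent, subsets):
    """Entropy of the parent minus the size-weighted entropy of the subsets."""
    total = len(parent)
    weighted = sum(len(s) / total * entropy(s) for s in subsets)
    return entropy(parent) - weighted


def best_threshold(values, labels):
    """Try midpoints between sorted distinct values; return (threshold, gain)."""
    distinct = sorted(set(values))
    best = (None, 0.0)
    for low, high in zip(distinct, distinct[1:]):
        cut = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= cut]
        right = [y for x, y in zip(values, labels) if x > cut]
        gain = information_gain(labels, [left, right])
        if gain > best[1]:
            best = (cut, gain)
    return best


# Hypothetical toy data: ages and whether each person bought a product.
ages = [22, 25, 30, 35, 40, 52, 60]
bought = ["no", "no", "no", "yes", "yes", "yes", "yes"]
threshold, gain = best_threshold(ages, bought)
print(threshold, round(gain, 3))
```

In this toy data a cut between ages 30 and 35 separates the two classes completely, so the gain at that cut equals the entropy of the whole set. A binary split like this corresponds to option (c) with two branches; a multiway split would pass more than two subsets to `information_gain`.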
To maximize predictive accuracy, a decision tree may be grown until it fits the training data completely, which makes the tree very deep. Such a tree shows good accuracy on training data but may not show such good accuracy on test data. The symptoms of an overfitted tree are excessive depth and too many branches, some of which may reflect anomalies due to noise or outliers. Such a tree should be pruned. There are two approaches to avoid overfitting:
- Prepruning: Halt the tree construction early, when certain criteria are met. The downside is that it is difficult to decide what criteria to use for halting the construction, because we do not know what may happen subsequently if we keep growing the tree.
- Postpruning: Remove branches or subtrees from a “fully grown” tree. This method is commonly used. The C4.5 algorithm uses a statistical method to estimate the errors at each node for pruning. A validation set may be used for pruning as well (Table 5.2).
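The two approaches can be contrasted in a small sketch. The code below is an illustration under stated assumptions, not the C4.5 method: it grows a binary tree using Gini impurity, treats the hypothetical parameters `max_depth` and `min_gini` as prepruning rules, and substitutes simple reduced-error pruning on a validation set for C4.5's statistical error estimate. All names and the toy data (including one deliberately noisy training label) are made up for the example.

```python
from collections import Counter


def majority(labels):
    """Most common class label in a list."""
    return Counter(labels).most_common(1)[0][0]


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def build(rows, labels, depth=0, max_depth=None, min_gini=0.0):
    """Grow a binary tree; max_depth and min_gini act as prepruning rules."""
    node = {"label": majority(labels)}
    too_deep = max_depth is not None and depth >= max_depth
    if too_deep or gini(labels) <= min_gini:
        return node  # stop early: this node stays a leaf
    best = None
    for col in range(len(rows[0])):
        # Skip the largest value so the right-hand subset is never empty.
        for cut in sorted({r[col] for r in rows})[:-1]:
            left = [i for i, r in enumerate(rows) if r[col] <= cut]
            right = [i for i, r in enumerate(rows) if r[col] > cut]
            score = sum(len(s) / len(rows) * gini([labels[i] for i in s])
                        for s in (left, right))
            if best is None or score < best[0]:
                best = (score, col, cut, left, right)
    if best is None:
        return node  # no usable split: all rows are identical
    _, col, cut, left, right = best
    node.update(col=col, cut=cut)
    node["left"] = build([rows[i] for i in left], [labels[i] for i in left],
                         depth + 1, max_depth, min_gini)
    node["right"] = build([rows[i] for i in right], [labels[i] for i in right],
                          depth + 1, max_depth, min_gini)
    return node


def predict(node, row):
    """Follow the splits down to a leaf and return its label."""
    while "col" in node:
        node = node["left"] if row[node["col"]] <= node["cut"] else node["right"]
    return node["label"]


def errors(node, rows, labels):
    """Number of misclassified rows."""
    return sum(predict(node, r) != y for r, y in zip(rows, labels))


def postprune(node, rows, labels):
    """Reduced-error pruning, bottom-up: collapse a subtree into a leaf when
    that does not increase errors on the validation rows that reach it.
    Note: this modifies the tree in place."""
    if "col" not in node:
        return node
    go_left = [r[node["col"]] <= node["cut"] for r in rows]
    for side, flag in (("left", True), ("right", False)):
        sub_rows = [r for r, g in zip(rows, go_left) if g == flag]
        sub_labels = [y for y, g in zip(labels, go_left) if g == flag]
        node[side] = postprune(node[side], sub_rows, sub_labels)
    leaf = {"label": node["label"]}
    if errors(leaf, rows, labels) <= errors(node, rows, labels):
        return leaf
    return node


# Hypothetical toy data: one feature; the label at x=3 is noise.
train_rows = [(1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,)]
train_labels = ["no", "no", "yes", "no", "yes", "yes", "yes", "yes"]
val_rows = [(1.5,), (2.5,), (3.5,), (4.5,), (6.5,)]
val_labels = ["no", "no", "no", "yes", "yes"]

shallow = build(train_rows, train_labels, max_depth=1)   # prepruned tree
full = build(train_rows, train_labels)                   # fully grown tree
full_val_errors = errors(full, val_rows, val_labels)     # measured before pruning
pruned = postprune(full, val_rows, val_labels)           # postpruned tree
print(full_val_errors, errors(pruned, val_rows, val_labels))
```

The fully grown tree fits the noisy training point and so misclassifies a validation row near it; after postpruning, the branch that encoded the noise is collapsed into a single leaf. The prepruned tree reaches the same shape here, but only because a depth limit of 1 happened to suit this data, which illustrates why the halting criteria are hard to choose in advance.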
Table 5.2 Comparing popular decision tree algorithms
Decision Tree | C4.5 | CART | CHAID
Full name | Iterative Dichotomiser (ID3) | Classification and regression trees | Chi-square automatic interaction detector
Basic algorithm | Hunt’s algorithm | Hunt’s algorithm | Adjusted significance testing
Developer | Ross Quinlan | Breiman | Gordon Kass
When developed | 1986 | 1984 | 1980
Types of trees | Classification | Classification and regression trees | Classification and regression
Serial implementation | Tree growth and tree pruning | Tree growth and tree pruning | Tree growth and tree pruning
Type of data | Discrete and continuous; incomplete data | Discrete and continuous | Non-normal data also accepted
Types of splits | Multiway splits | Binary splits only; clever surrogate splits to reduce tree depth | Multiway splits as default
Splitting criteria | Information gain | Gini coefficient, and others | Chi-square test
Pruning criteria | Clever bottom-up technique avoids overfitting | Remove weakest links first | Trees can become very large
Implementation | Publicly available | Publicly available in most packages | Popular in market research, for segmentation
